Latent Semantic Indexing: An overview

Author

  • Barbara Rosario
Abstract

Typically, information is retrieved by literally matching terms in documents with those of a query. However, lexical matching methods can be inaccurate when they are used to match a user's query. Since there are usually many ways to express a given concept (synonymy), the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings (polysemy), so terms in a user's query may also literally match terms in irrelevant documents. A better approach would allow users to retrieve information on the basis of the conceptual topic or meaning of a document. Latent Semantic Indexing (LSI) [Deerwester et al.] tries to overcome the problems of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval. LSI assumes that there is some underlying or latent structure in word usage that is partially obscured by variability in word choice. A truncated singular value decomposition (SVD) is used to estimate this structure in word usage across documents. Retrieval is then performed using the database of singular values and vectors obtained from the truncated SVD. Performance data show that these statistically derived vectors are more robust indicators of meaning than individual terms. Section 2 reviews the basic concepts needed to understand LSI. Section 3 describes some of the advantages and disadvantages of LSI. The effectiveness of LSI has been demonstrated empirically on several text collections in the form of increased average retrieval precision, but a theoretical (and quantitative) understanding beyond the empirical evidence is desirable. Section 4 describes some of the attempts that have been made in this direction. Finally, Section 5 presents some applications of LSI.
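The retrieval procedure summarized in the abstract can be illustrated with a minimal sketch, assuming NumPy and a tiny hypothetical term-by-document matrix of raw term counts (practical LSI systems usually apply a term weighting such as tf-idf first, which is omitted here). It computes a rank-k truncated SVD A ≈ U_k Σ_k V_k^T, maps a query into the latent space with q̂ = Σ_k^-1 U_k^T q, and ranks documents by cosine similarity. The function names (build_lsi, fold_in, cosine, lsi_scores), the toy vocabulary, and the choice to compare vectors after scaling by the singular values are illustrative assumptions, not the method of the overview itself.

```python
import numpy as np

def build_lsi(A: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs).

    Returns (U_k, s_k, V_k) with A approximately equal to U_k @ np.diag(s_k) @ V_k.T,
    where V_k has one row per document.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Map a term-space vector into the latent space: q_hat = diag(s_k)^-1 @ U_k.T @ q.

    Applied to column j of A this returns row j of V_k, so queries and
    documents share the same k-dimensional coordinates. Assumes k <= rank(A),
    so every kept singular value is nonzero.
    """
    return (U_k.T @ q) / s_k

def cosine(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of X and the vector y (0 for zero vectors)."""
    denom = np.linalg.norm(X, axis=1) * np.linalg.norm(y)
    return (X @ y) / np.where(denom == 0, 1.0, denom)

def lsi_scores(q, U_k, s_k, V_k):
    """Score every document against query q in the singular-value-scaled latent space."""
    return cosine(V_k * s_k, fold_in(q, U_k, s_k) * s_k)

# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # flower
    [0, 0, 1],  # petal
], dtype=float)

q = np.zeros(len(terms))
q[terms.index("car")] = 1.0  # query: "car"

lexical = cosine(A.T, q)  # literal term matching against each document
U_k, s_k, V_k = build_lsi(A, k=2)
latent = lsi_scores(q, U_k, s_k, V_k)

print("lexical:", np.round(lexical, 3))  # doc 1 ("automobile engine") scores 0
print("LSI:    ", np.round(latent, 3))   # doc 1 now scores as high as doc 0
```

In this toy collection the query "car" shares no term with document 1 ("automobile engine"), so literal matching gives it a score of zero. Because "car" and "automobile" both co-occur with "engine", the rank-2 space places documents 0 and 1 together and the query retrieves both, which is the synonymy problem described in the abstract. The example does not reproduce any performance data from the overview.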

Similar resources

Getting Better Results With Latent Semantic Indexing

The paper presents an overview of some important factors influencing the quality of the results obtained when using Latent Semantic Indexing. The factors are separated into 5 major groups and analyzed both separately and as a whole. A new class of extended Boolean operations such as OR, AND and NOT (ANDNOT) and their combinations is proposed and evaluated on a corpus of religious...

Enhancing Literature Review Methods - towards More Efficient Literature Research with Latent Semantic Indexing

Nowadays, the facilitated access to increasing amounts of information and scientific resources means that more and more effort is required to conduct comprehensive literature reviews. Literature search, as a fundamental, complex, and time-consuming step in every literature research process, is part of many established scientific methods. However, it is still predominantly supported by search te...

Enhancing Literature Research Processes: A Glance at an Approach Based on Latent Semantic Indexing

Literature search as a fundamental, complex and time-consuming step in a literature research process is part of many established scientific methods. It is still predominantly supported by search techniques based on conventional term-matching methods. We address the lack of semantic approaches in this context by proposing an enhancement of the literature research process with a prototype of our ...

Using Random Indexing to improve Singular Value Decomposition for Latent Semantic Analysis

We present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing and Latent Semantic Analysis on Random Indexing reduced matrices. In this study we use a corpus comprising 1003 documents from the MEDLINE-corpus. Our results show that Latent Semantic Analysis on Random Index...

A Survey of Information Retrieval and Filtering Methods

We survey the major techniques for information retrieval. In the first part, we provide an overview of the traditional ones: full text scanning, inversion, signature files, and clustering. In the second part, we discuss attempts to include semantic information: natural language processing, latent semantic indexing, and neural networks.

Publication date: 2001